371 research outputs found

    Cost-Effective HITs for Relative Similarity Comparisons

    Similarity comparisons of the form "Is object a more similar to b than to c?" are useful for computer vision and machine learning applications. Unfortunately, an embedding of n points is specified by n^3 triplets, making collecting every triplet an expensive task. Noting this difficulty, other researchers have investigated more intelligent triplet sampling techniques, but they do not study their effectiveness or their potential drawbacks. Although it is important to reduce the number of collected triplets, it is also important to understand how best to display a triplet collection task to a user. In this work we explore an alternative display for collecting triplets and analyze the monetary cost and speed of the display. We propose best practices for creating cost-effective human intelligence tasks for collecting triplets. We show that rather than changing the sampling algorithm, simple changes to the crowdsourcing UI can lead to much higher quality embeddings. We also provide a dataset as well as the labels collected from crowd workers. Comment: 7 pages, 7 figures
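    The n^3 figure above is best read as an order of growth. As a rough sketch, assuming that a, b and c must be distinct and that swapping b and c asks the same question (conventions the abstract does not state), the number of distinct queries can be counted as follows. The function name is hypothetical.

```python
from math import comb


def distinct_triplet_queries(n: int) -> int:
    """Count queries "is a more similar to b than to c?" over n objects.

    Assumption (not from the abstract): a, b, c are distinct, and swapping
    b and c asks the same question, so each anchor a pairs with
    C(n - 1, 2) unordered pairs {b, c}.
    """
    return n * comb(n - 1, 2)


# Compare the exact count under these assumptions with the n^3 bound.
for n in (10, 100, 1000):
    print(n, distinct_triplet_queries(n), n ** 3)
```

    Under these assumptions the count is n(n-1)(n-2)/2, which still grows cubically, so exhaustive collection becomes costly even for a few hundred objects.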

    Overcomplete steerable pyramid filters and rotation invariance

    A given (overcomplete) discrete oriented pyramid may be converted into a steerable pyramid by interpolation. We present a technique for deriving the optimal interpolation functions (otherwise called 'steering coefficients'). The proposed scheme is demonstrated on a computationally efficient oriented pyramid, which is a variation on the Burt and Adelson (1983) pyramid. We apply the generated steerable pyramid to orientation-invariant texture analysis in order to demonstrate its excellent rotational isotropy. High classification rates and precise rotation identification are demonstrated.

    Enhanced decoding for the Galileo S-band mission

    A coding system under consideration for the Galileo S-band low-gain antenna mission is a concatenated system using a variable redundancy Reed-Solomon outer code and a (14,1/4) convolutional inner code. The 8-bit Reed-Solomon symbols are interleaved to depth 8, and the eight 255-symbol codewords in each interleaved block have redundancies 64, 20, 20, 20, 64, 20, 20, and 20, respectively (or equivalently, the codewords have 191, 235, 235, 235, 191, 235, 235, and 235 8-bit information symbols, respectively). This concatenated code is to be decoded by an enhanced decoder that utilizes a maximum likelihood (Viterbi) convolutional decoder; a Reed-Solomon decoder capable of processing erasures; an algorithm for declaring erasures in undecoded codewords based on known erroneous symbols in neighboring decodable words; a second Viterbi decoding operation (redecoding) constrained to follow only paths consistent with the known symbols from previously decodable Reed-Solomon codewords; and a second Reed-Solomon decoding operation using the output from the Viterbi redecoder and additional erasure declarations to the extent possible. It is estimated that this code and decoder can achieve a decoded bit error rate of 1 x 10^-7 at a concatenated code signal-to-noise ratio of 0.76 dB. By comparison, a threshold of 1.17 dB is required for a baseline coding system consisting of the same (14,1/4) convolutional code, a (255,223) Reed-Solomon code with constant redundancy 32 also interleaved to depth 8, a one-pass Viterbi decoder, and a Reed-Solomon decoder incapable of declaring or utilizing erasures. The relative gain of the enhanced system is thus 0.41 dB. It is predicted from analysis based on an assumption of infinite interleaving that the coding gain could be further improved by approximately 0.2 dB if four stages of Viterbi decoding and four levels of Reed-Solomon redundancy are permitted. Confirmation of this effect and specification of the optimum four-level redundancy profile for depth-8 interleaving are currently under way.
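    The arithmetic in the abstract above can be checked with a short sketch. The variable names are illustrative, and the average outer-code rate is a derived quantity that the abstract itself does not quote.

```python
# Figures taken from the abstract: codeword length and per-codeword redundancy.
CODEWORD_SYMBOLS = 255
redundancies = [64, 20, 20, 20, 64, 20, 20, 20]

# Information symbols per codeword are length minus redundancy.
info_symbols = [CODEWORD_SYMBOLS - r for r in redundancies]

# Average Reed-Solomon rate over one depth-8 interleaved block (derived here).
variable_rate = sum(info_symbols) / (CODEWORD_SYMBOLS * len(redundancies))
baseline_rate = 223 / 255  # the (255,223) baseline code

# Relative gain is the difference between the two quoted thresholds, in dB.
gain_db = 1.17 - 0.76

print(info_symbols)
print(f"variable-redundancy rate: {variable_rate:.4f}")
print(f"baseline rate:            {baseline_rate:.4f}")
print(f"relative gain:            {gain_db:.2f} dB")
```

    The eight codewords carry 31 redundant symbols on average, slightly fewer than the 32 of the baseline code, so the quoted gain does not come from adding outer-code redundancy overall.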

    Enhanced decoding for the Galileo low-gain antenna mission: Viterbi redecoding with four decoding stages

    The Galileo low-gain antenna mission will be supported by a coding system that uses a (14,1/4) inner convolutional code concatenated with Reed-Solomon codes of four different redundancies. Decoding for this code is designed to proceed in four distinct stages of Viterbi decoding followed by Reed-Solomon decoding. In each successive stage, the Reed-Solomon decoder only tries to decode the highest redundancy codewords not yet decoded in previous stages, and the Viterbi decoder redecodes its data utilizing the known symbols from previously decoded Reed-Solomon codewords. A previous article analyzed a two-stage decoding option that was not selected by Galileo. The present article analyzes the four-stage decoding scheme and derives the near-optimum set of redundancies selected for use by Galileo. The performance improvements relative to one- and two-stage decoding systems are evaluated.
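    The stage ordering described above can be illustrated with a minimal sketch. It models only which codewords the Reed-Solomon decoder attempts in each stage, not the Viterbi redecoding between stages. The function name is hypothetical, and the four-level profile below is an invented placeholder, since the abstract does not state the selected redundancies.

```python
from itertools import groupby


def decoding_stages(redundancies):
    """Group codeword indices into stages, highest redundancy first.

    Simplification: each distinct redundancy level forms one stage, and
    every codeword attempted in a stage is assumed to decode.
    """
    # sorted() is stable, so equal-redundancy indices stay in ascending order.
    order = sorted(range(len(redundancies)), key=lambda i: -redundancies[i])
    return [list(group)
            for _, group in groupby(order, key=lambda i: redundancies[i])]


# Placeholder four-level profile for a depth-8 block (NOT the mission values).
hypothetical_profile = [40, 10, 20, 10, 30, 10, 20, 10]

for stage, codewords in enumerate(decoding_stages(hypothetical_profile), 1):
    print(f"stage {stage}: attempt codewords {codewords}")
```

    With four distinct redundancy levels the sketch yields four stages. A two-level profile such as 64, 20, 20, 20, 64, 20, 20, 20 collapses to two.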

    Many-to-Many Graph Matching: a Continuous Relaxation Approach

    Graphs provide an efficient tool for object representation in various computer vision applications. Once graph-based representations are constructed, an important question is how to compare graphs. This problem is often formulated as a graph matching problem where one seeks a mapping between vertices of two graphs which optimally aligns their structure. In the classical formulation of graph matching, only one-to-one correspondences between vertices are considered. However, in many applications, graphs cannot be matched perfectly and it is more interesting to consider many-to-many correspondences where clusters of vertices in one graph are matched to clusters of vertices in the other graph. In this paper, we formulate the many-to-many graph matching problem as a discrete optimization problem and propose an approximate algorithm based on a continuous relaxation of the combinatorial problem. We compare our method with other existing methods on several benchmark computer vision datasets. Comment: 1

    A robust braille recognition system

    Braille is the most effective means of written communication between visually impaired and sighted people. This paper describes a new system that recognizes Braille characters in scanned Braille document pages. Unlike most other approaches, an inexpensive flatbed scanner is used and the system requires minimal interaction with the user. A unique feature of this system is the use of context at different levels (from the pre-processing of the image through to the post-processing of the recognition results) to enhance robustness and, consequently, recognition results. Braille dots composing characters are identified on both single and double-sided documents of average quality with over 99% accuracy, while Braille characters are also correctly recognized in over 99% of documents of average quality (in both single and double-sided documents).

    Word matching using single closed contours for indexing handwritten historical documents

    Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, a holistic word recognition approach has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al. in Proc. Document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features, avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.

    Real-Time Hand Shape Classification

    The problem of hand shape classification is challenging since a hand is characterized by a large number of degrees of freedom. Numerous shape descriptors have been proposed and applied over the years to estimate and classify hand poses in reasonable time. In this paper we discuss our parallel framework for real-time hand shape classification. We show how the number of gallery images influences the classification accuracy and execution time of the parallel algorithm. We present the speedup and efficiency analyses that prove the efficacy of the parallel implementation. Notably, different methods can be used at each step of our parallel framework. Here, we combine the shape contexts with the appearance-based techniques to enhance the robustness of the algorithm and to increase the classification score. An extensive experimental study proves the superiority of the proposed approach over existing state-of-the-art methods. Comment: 11 pages

    Face analysis using curve edge maps

    This paper proposes an automatic and real-time system for face analysis, usable in visual communication applications. In this approach, faces are represented with Curve Edge Maps, which are collections of polynomial segments with a convex region. The segments are extracted from edge pixels using an adaptive incremental linear-time fitting algorithm, which is based on constructive polynomial fitting. The face analysis system considers face tracking, face recognition and facial feature detection, using Curve Edge Maps driven by histograms of intensities and histograms of relative positions. When applied to different face databases and video sequences, the average face recognition rate is 95.51%, the average facial feature detection rate is 91.92% and the accuracy in location of the facial features is 2.18% in terms of the size of the face, which is comparable with or better than the results in the literature. In addition, our method has the advantages of simplicity, real-time performance and extensibility to the different aspects of face analysis, such as recognition of facial expressions and talking.

    Topological descriptors for 3D surface analysis

    We investigate topological descriptors for 3D surface analysis, i.e. the classification of surfaces according to their geometric fine structure. On a dataset of high-resolution 3D surface reconstructions we compute persistence diagrams for a 2D cubical filtration. In the next step we investigate different topological descriptors and measure their ability to discriminate structurally different 3D surface patches. We evaluate their sensitivity to different parameters and compare the performance of the resulting topological descriptors to alternative (non-topological) descriptors. We present a comprehensive evaluation that shows that topological descriptors (i) are robust, (ii) yield state-of-the-art performance for the task of 3D surface analysis and (iii) improve classification performance when combined with non-topological descriptors. Comment: 12 pages, 3 figures, CTIC 201